AITopics | effective sparsity


effective sparsity


Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Vysogorets, Artem, Kempe, Julia

arXiv.org Artificial Intelligence, Apr-7-2023

Neural network pruning is a fruitful area of research with surging interest in high-sparsity regimes. Benchmarking in this domain relies heavily on a faithful representation of the sparsity of subnetworks, which has traditionally been computed as the fraction of removed connections (direct sparsity). This definition, however, fails to recognize unpruned parameters that have become detached from the input or output layers of the underlying subnetwork, potentially underestimating the actual effective sparsity: the fraction of inactivated connections. While this effect may be negligible for moderately pruned networks (compression rates of up to 10 to 100), we find that it plays an increasing role for sparser subnetworks, greatly distorting comparisons between pruning algorithms. For example, we show that the effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart, while no discrepancy is ever observed when using SynFlow for pruning (Tanaka et al., 2020). In this work, we adopt the lens of effective sparsity to reevaluate several recent pruning algorithms on common benchmark architectures (e.g., LeNet-300-100, VGG-19, ResNet-18) and find that their absolute and relative performance changes dramatically in this new and, as we argue, more appropriate framework. To target effective rather than direct sparsity, we develop a low-cost extension applicable to most pruning algorithms. Further, with effective sparsity as the reference frame, we partially reconfirm that random pruning with appropriate sparsity allocation across layers performs as well as or better than more sophisticated algorithms for pruning at initialization (Su et al., 2020). In response to this observation, using an analogy from thermodynamics to the pressure distribution in coupled cylinders, we design novel layerwise sparsity quotas that outperform all existing baselines in the context of random pruning.
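To make the direct/effective distinction concrete, here is a minimal sketch, assuming a fully connected network whose binary pruning masks are stored as nested Python lists and ignoring biases (a unit with no incoming weights is treated as inactive). It keeps a surviving weight only if it lies on some input-to-output path, then measures sparsity and compression rate (total weights divided by remaining weights) before and after. The function names are hypothetical; this is not the authors' implementation, and the paper's handling of biases or convolutional layers may differ.

```python
from typing import List

# masks[l][i][j] == 1 keeps the weight from unit j in layer l to unit i in layer l + 1.
# Hypothetical representation chosen for illustration; biases are ignored (assumption).
Mask = List[List[int]]


def direct_sparsity(masks: List[Mask]) -> float:
    """Fraction of weights removed by the masks themselves."""
    total = sum(len(m) * len(m[0]) for m in masks)
    kept = sum(sum(row) for m in masks for row in m)
    return 1.0 - kept / total


def effective_mask(masks: List[Mask]) -> List[Mask]:
    """Zero out kept weights that lie on no input-to-output path."""
    # Forward pass: units reachable from the input layer.
    reach = [[True] * len(masks[0][0])]
    for m in masks:
        prev = reach[-1]
        reach.append([any(m[i][j] and prev[j] for j in range(len(prev)))
                      for i in range(len(m))])
    # Backward pass: units from which the output layer can be reached.
    coreach = [[True] * len(masks[-1])]
    for m in reversed(masks):
        nxt = coreach[0]
        coreach.insert(0, [any(m[i][j] and nxt[i] for i in range(len(m)))
                           for j in range(len(m[0]))])
    # A weight is active iff it is kept, its source is reachable, and its target co-reachable.
    return [[[int(m[i][j] and reach[l][j] and coreach[l + 1][i])
              for j in range(len(m[0]))] for i in range(len(m))]
            for l, m in enumerate(masks)]


def effective_sparsity(masks: List[Mask]) -> float:
    return direct_sparsity(effective_mask(masks))


def compression(sparsity: float) -> float:
    """Compression rate = total weights / remaining weights = 1 / (1 - sparsity)."""
    return float("inf") if sparsity >= 1.0 else 1.0 / (1.0 - sparsity)


# Hypothetical 2-3-1 mask: hidden unit 1 has no inputs, hidden unit 2 has no outputs.
masks = [
    [[1, 0], [0, 0], [1, 1]],  # input (2) -> hidden (3)
    [[1, 1, 0]],               # hidden (3) -> output (1)
]
print(direct_sparsity(masks), effective_sparsity(masks))  # about 0.444 and 0.778
```

In this toy mask, direct sparsity is 4/9 while effective sparsity is 7/9, so the effective compression (4.5) is 2.5 times the direct one (1.8). Two reachability passes suffice because a weight is active exactly when its source unit is reachable from the input and its target unit can reach the output.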

artificial intelligence, machine learning, pruning


arXiv:2107.02306

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Generalized Linear Models with Structured Sparsity Estimators

Caner, Mehmet

arXiv.org Machine Learning, Apr-29-2021

In this paper, we introduce structured sparsity estimators in Generalized Linear Models. Structured sparsity estimators with least squares loss were recently introduced by Stucky and van de Geer (2018) for fixed design and normal errors. We extend their results to debiased structured sparsity estimators with a Generalized Linear Model based loss. Structured sparsity estimation refers to penalized loss functions in which the chosen norm may encode a sparsity structure; examples include the weighted group lasso, the lasso, and norms generated from convex cones. The main difficulty is that it is not clear how to prove the two required oracle inequalities. The first is for the initial penalized Generalized Linear Model estimator. Because it is not clear how a particular feasible weighted nodewise regression fits into an oracle inequality for the penalized Generalized Linear Model, we need a second oracle inequality to obtain oracle bounds for the approximate inverse of the sample estimate of the second-order partial derivative of the Generalized Linear Model loss. Our contributions are fivefold:

1. We generalize existing oracle inequality results for penalized Generalized Linear Models by proving the underlying conditions rather than assuming them. A key issue is proving a sample one-point margin condition and using it in an oracle inequality.
2. Our results cover even non-sub-Gaussian errors and regressors.
3. We provide a proof for feasible weighted nodewise regression that generalizes results in the literature from the simple $\ell_1$ norm to norms generated from convex cones.
4. We observe that the norms used in feasible nodewise regression proofs should be weaker than or equal to the norms in the penalized Generalized Linear Model loss.
5. We can debias the first-step estimator by obtaining an approximate inverse of the singular sample second-order partial derivative of the Generalized Linear Model loss.
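The estimators above combine a GLM loss with a structured-sparsity norm. The sketch below assumes a logistic GLM (labels in {0, 1}) purely for illustration and implements the weighted group lasso norm $\Omega(\beta) = \sum_g w_g \|\beta_g\|_2$, which reduces to the lasso with singleton groups and unit weights, together with its proximal operator (block soft-thresholding). All names are hypothetical; the paper's debiasing via feasible weighted nodewise regression and its norms generated from convex cones are not shown.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: logistic loss is an assumed example of a GLM loss.


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logistic_loss(beta: Sequence[float], X: Sequence[Sequence[float]],
                  y: Sequence[int]) -> float:
    """Average negative log-likelihood of a logistic GLM with labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))  # linear predictor x_i^T beta
        total += softplus(eta) - yi * eta
    return total / len(y)


def weighted_group_lasso(beta: Sequence[float], groups: Sequence[Sequence[int]],
                         weights: Sequence[float]) -> float:
    """Omega(beta) = sum_g w_g * ||beta_g||_2; singleton groups with unit weights give the lasso."""
    return sum(w * math.sqrt(sum(beta[k] ** 2 for k in g))
               for g, w in zip(groups, weights))


def penalized_objective(beta, X, y, groups, weights, lam: float) -> float:
    """GLM loss plus lam times the structured-sparsity norm."""
    return logistic_loss(beta, X, y) + lam * weighted_group_lasso(beta, groups, weights)


def prox_weighted_group_lasso(v: Sequence[float], groups: Sequence[Sequence[int]],
                              weights: Sequence[float], step: float) -> List[float]:
    """Block soft-thresholding: shrink each group, zeroing it when ||v_g||_2 <= step * w_g."""
    out = list(v)
    for g, w in zip(groups, weights):
        norm = math.sqrt(sum(v[k] ** 2 for k in g))
        scale = max(0.0, 1.0 - step * w / norm) if norm > 0 else 0.0
        for k in g:
            out[k] = scale * v[k]
    return out
```

A proximal gradient method would alternate a gradient step on `logistic_loss` with `prox_weighted_group_lasso`; any group whose norm falls below `step * w_g` is set exactly to zero, which is how the group structure produces sparsity.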

estimator, geer, inequality


arXiv:2104.14371

Country:

North America > United States > North Carolina (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


Oracle inequalities for image denoising with total variation regularization

Ortelli, Francesco, van de Geer, Sara

arXiv.org Machine Learning, Nov-17-2019

We derive oracle results for discrete image denoising with a total variation penalty. We consider the least squares estimator with a penalty on the $\ell^1$-norm of the total discrete derivative of the image. This estimator belongs to the class of analysis estimators. Bounding the effective sparsity by means of an interpolating matrix allows us to obtain oracle inequalities with fast rates. This bound extends the bound of Ortelli and van de Geer [2019c] to the two-dimensional case. We also present an oracle inequality with slow rates, which matches, up to a log term, the rate obtained for the same estimator by Mammen and van de Geer [1997]. The key ingredients of our results are the projection arguments of Dalalyan et al. [2017] for bounding the empirical process.
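A minimal sketch of a TV-penalized least squares objective for an image stored as a list of rows. As a stand-in for the paper's total discrete derivative, it uses the anisotropic total variation (the sum of absolute horizontal and vertical first differences); the paper's operator may be defined differently. Normalizing the squared error by the number of pixels, and the names `anisotropic_tv` and `tv_objective`, are assumptions. The code only evaluates the objective; it does not solve for the minimizer.

```python
from typing import List

Image = List[List[float]]  # image as a list of equal-length rows


def anisotropic_tv(f: Image) -> float:
    """Sum of absolute horizontal and vertical first differences (assumed stand-in penalty)."""
    rows, cols = len(f), len(f[0])
    horiz = sum(abs(f[r][c + 1] - f[r][c]) for r in range(rows) for c in range(cols - 1))
    vert = sum(abs(f[r + 1][c] - f[r][c]) for r in range(rows - 1) for c in range(cols))
    return horiz + vert


def tv_objective(f: Image, y: Image, lam: float) -> float:
    """Mean squared residual between observation y and candidate f, plus lam * TV(f)."""
    n = len(y) * len(y[0])
    fit = sum((y[r][c] - f[r][c]) ** 2
              for r in range(len(y)) for c in range(len(y[0]))) / n
    return fit + lam * anisotropic_tv(f)


# Hypothetical two-region image and a noisy observation of it.
clean = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
noisy = [[0.1, -0.1, 1.0], [0.0, 0.1, 0.9]]
print(tv_objective(clean, noisy, 0.5) < tv_objective(noisy, noisy, 0.5))  # True
```

Here the piecewise-constant candidate pays a small fit cost but a much smaller penalty than the noisy observation itself, which is the trade-off a TV-penalized estimator exploits.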

artificial intelligence, ortelli and van


arXiv:1911.07231

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence (0.93)


Prediction bounds for (higher order) total variation regularized least squares

Ortelli, Francesco, van de Geer, Sara

arXiv.org Machine Learning, Oct-3-2019

We establish oracle inequalities for the least squares estimator $\hat f$ with a penalty on the total variation of $\hat f$ or on its higher-order differences. Our main tool is an interpolating vector that leads to upper bounds for the effective sparsity. This allows us to show that the penalty on the $k^{\text{th}}$ order differences leads to an estimator $\hat f$ that can adapt to the number of jumps in the $(k-1)^{\text{th}}$ order differences. We present the details for $k = 2$ and $k = 3$ and outline a framework for deriving the result for general $k \in \mathbb{N}$.
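The claim that penalizing $k$-th order differences adapts to jumps in the $(k-1)$-th order differences can be checked on a toy signal. This minimal sketch, with hypothetical function names, computes the $k$-th order total variation $\sum_i |(\Delta^k f)_i|$ of a 1D sequence and counts its nonzero $k$-th differences; it does not compute the estimator $\hat f$ itself, which requires solving the penalized least squares problem.

```python
from typing import List, Sequence


def kth_differences(f: Sequence[float], k: int) -> List[float]:
    """Apply the forward difference operator k times: (Delta^k f)_i."""
    d = list(f)
    for _ in range(k):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return d


def kth_order_tv(f: Sequence[float], k: int) -> float:
    """Penalty on k-th order differences: sum of their absolute values."""
    return sum(abs(x) for x in kth_differences(f, k))


def count_jumps(f: Sequence[float], k: int, tol: float = 1e-12) -> int:
    """Number of nonzero k-th differences, i.e. jumps in the (k-1)-th differences."""
    return sum(1 for x in kth_differences(f, k) if abs(x) > tol)


f = [0, 1, 2, 3, 2, 1]  # piecewise linear with a single kink
print(kth_differences(f, 2))                 # [0, 0, -2, 0]
print(kth_order_tv(f, 2), count_jumps(f, 2))  # 2 1
```

The signal rises linearly and then falls linearly, so with $k = 2$ its penalty is concentrated in a single nonzero second difference at the kink, whereas with $k = 1$ every first difference is nonzero.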

effective sparsity, ortelli higher order


arXiv:1904.10871

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


On the Effect of Low-Rank Weights on Adversarial Robustness of Neural Networks

Langeberg, Peter, Balda, Emilio Rafael, Behboodi, Arash, Mathar, Rudolf

arXiv.org Machine Learning, Jan-29-2019

Recently, there has been an abundance of work on designing Deep Neural Networks (DNNs) that are robust to adversarial examples. In particular, a central question is which features of DNNs influence adversarial robustness and can therefore be used to design robust DNNs. In this work, this problem is studied through the lens of compression, which is captured by the low-rank structure of weight matrices. It is first shown that adversarial training tends to promote simultaneously low-rank and sparse structure in the weight matrices of neural networks. This is measured through the notions of effective rank and effective sparsity. In the reverse direction, when low-rank structure is promoted by nuclear norm regularization and combined with sparsity-inducing regularization, neural networks show significantly improved adversarial robustness. The effect of nuclear norm regularization on adversarial robustness is most pronounced when it is applied to convolutional neural networks. Although this approach does not yet match adversarial training, the result contributes to understanding the key properties of robust classifiers.
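The abstract does not spell out how effective rank and effective sparsity are measured. One common choice, assumed here and not necessarily the one used in the paper, is the ratio $(\|v\|_1/\|v\|_2)^2$: applied to flattened weights it acts as an effective count of nonzero entries, and applied to the singular values of a weight matrix it equals the squared ratio of nuclear norm to Frobenius norm, an effective rank. The sketch takes singular values as input (they could be obtained, for instance, with `numpy.linalg.svd(W, compute_uv=False)`); all names are hypothetical.

```python
from typing import Sequence

# Assumed measures for illustration; the paper may define them differently.


def l1_l2_ratio(v: Sequence[float]) -> float:
    """(||v||_1 / ||v||_2)^2: equals the number of nonzeros when they share one magnitude."""
    l1 = sum(abs(x) for x in v)
    l2_sq = sum(x * x for x in v)
    return 0.0 if l2_sq == 0 else l1 * l1 / l2_sq


def effective_support_size(weights: Sequence[float]) -> float:
    """Assumed effective-sparsity measure for a flattened weight matrix."""
    return l1_l2_ratio(weights)


def effective_rank(singular_values: Sequence[float]) -> float:
    """Same ratio on singular values: (nuclear norm / Frobenius norm)^2."""
    return l1_l2_ratio(singular_values)


def nuclear_norm_penalty(singular_values: Sequence[float], lam: float) -> float:
    """lam * ||W||_*, the nuclear-norm regularizer that promotes low rank."""
    return lam * sum(singular_values)
```

Because the ratio is scale-invariant, it rewards concentrating mass in few entries (or few singular values) regardless of the overall weight magnitude, which is why it can serve as a proxy for both sparsity and rank.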

adversarial training, classifier, robustness


arXiv:1901.10371

Genre: Research Report (0.65)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
